RfaH Counter-Silences Inhibition of Transcript Elongation by H-NS–StpA Nucleoprotein Filaments in Pathogenic Escherichia coli

ABSTRACT Expression of virulence genes in pathogenic Escherichia coli is controlled in part by the transcription silencer H-NS and its paralogs (e.g., StpA), which sequester DNA in multi-kb nucleoprotein filaments to inhibit transcription initiation, elongation, or both. Some activators counter-silence initiation by displacing H-NS from promoters, but how H-NS inhibition of elongation is overcome is not understood. In uropathogenic E. coli (UPEC), elongation regulator RfaH aids expression of some H-NS-silenced pathogenicity operons (e.g., hlyCABD encoding hemolysin). RfaH associates with elongation complexes (ECs) via direct contacts to a transiently exposed, nontemplate DNA strand sequence called operon polarity suppressor (ops). RfaH–ops interactions establish long-lived RfaH–EC contacts that allow RfaH to recruit ribosomes to the nascent mRNA and to suppress transcriptional pausing and termination. Using ChIP-seq, we mapped the genome-scale distributions of RfaH, H-NS, StpA, RNA polymerase (RNAP), and σ70 in the UPEC strain CFT073. We identify eight RfaH-activated operons, all of which were bound by H-NS and StpA. Four are new additions to the RfaH regulon. Deletion of RfaH caused premature termination, whereas deletion of H-NS and StpA allowed elongation without RfaH. Thus, RfaH is an elongation counter-silencer of H-NS. Consistent with elongation counter-silencing, deletion of StpA alone decreased the effect of RfaH. StpA increases DNA bridging, which inhibits transcript elongation via topological constraints on RNAP. Residual RfaH effect when both H-NS and StpA were deleted was attributable to targeting of RfaH-regulated operons by a minor H-NS paralog, Hfp. These operons have evolved higher levels of H-NS–binding features, explaining minor-paralog targeting.

was collected by centrifugation at 8,000 × g, 15 min, 4 °C. The pellet was resuspended in 35 mL of Heparin binding buffer A (50 mM Tris-HCl pH 7.5, 1 mM EDTA, 10% glycerol, 0.3 M NaCl, 0.5 mM DTT) and was loaded onto a HiTrap Heparin HP column attached to an AktaPure (GE Healthcare) at a 0.5 mL/min flow rate as described previously (7). The column was washed with 30 mL of Heparin binding buffer A and eluted with a 0-100% gradient of Heparin binding buffer B (50 mM Tris-HCl pH 7.5, 1 mM EDTA, 10% glycerol, 1 M NaCl, 0.5 mM DTT) over 40 min at 1 mL/min. Fractions containing StpA were pooled and dialyzed overnight with 3,500 MWCO tubing in no-salt precipitation buffer. Precipitated StpA was collected by centrifugation at 8,000 × g, 10 min, 4 °C, and resuspended in StpA storage buffer (50 mM Tris-HCl pH 7.5, 1 mM EDTA, 5% glycerol, 0.3 M NaCl). StpA purity was confirmed by sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE), and the protein was stored at -80 °C. StpA preps from multiple rounds of purification were pooled from overexpression in 12 L LB to generate 1.5 mg total of StpA for antiserum generation in rabbits by Covance (Denver, PA).
H-NS (Harlan, Final bleed, ID R10041, Cat POL001) and StpA (Covance, Final bleed, WI631) antisera were pre-cleared by adsorption to cell powders produced from appropriate CFT073 deletion strains to reduce cross-reactivity. Briefly, 1 L cultures of WAM5676 and WAM5677 were grown to an apparent OD600 of ~0.4 in MOPS rich-defined medium (RDM) + 0.2% glucose (11) (Teknova) and collected by centrifugation at 4,000 × g, 15 min, 4 °C. The pellets were resuspended in 1 mL of cold 0.9% NaCl per 1 g of pellet to induce gentle lysis for 5 min on ice. The solution was then mixed with a 4:1 ratio of acetone (chilled at -20 °C) to resuspension and then incubated on ice for 30 min with occasional vortexing. The acetone precipitate was collected by centrifugation at 10,000 × g, 10 min, 4 °C. The pellet was resuspended with half the original volume of acetone, incubated for 10 min at 0 °C, and collected by centrifugation at 10,000 × g, 10 min, 4 °C. The pellet was dried overnight at 0 °C into a powder. The powder was resuspended in 10 mL of 1X IP Buffer (100 mM Tris, pH 8.0, 300 mM NaCl, 1% TritonX-100, 1 mM PMSF). Antisera were immunodepleted by adding 2 mL of H-NS antiserum, as used in (9,10), to the WAM5676 resuspension and 2 mL of StpA antiserum to the WAM5677 resuspension.
The resuspension was rocked gently for 30 min at 4 °C to allow antibodies against non-target epitopes to adsorb to the cell powder. The remaining non-bound antibodies were collected from the supernatant after centrifugation at 10,000 × g, 10 min, 4 °C and stored at -80 °C. Specificity and success of purification of H-NS and StpA antisera were assessed by western blot on nitrocellulose membrane comparing antisera specificity before and after immunodepletion using whole-cell lysates of WAM4505 (WT), WAM5676 (Δhns), WAM5677 (ΔstpA), and ΔhnsΔstpA grown in MOPS RDM + 0.2% glucose and harvested at an apparent OD600 of ~0.4 (Fig. S1F, G).

Cell collection and crosslinking
Cells were harvested and processed for ChIP-seq essentially as previously described (12), with minor modifications. CFT073 strains were grown in MOPS RDM + 0.2% glucose. We used RDM because CFT073 ΔhnsΔstpA exhibited a prohibitively slow growth rate in MOPS minimal media. Strains were grown in 400 mL of MOPS RDM in 2 L Erlenmeyer flasks aerobically at 37 °C with shaking at 140 rpm until mid-exponential phase (apparent OD600 ~0.4). Crosslinking was then initiated by addition of 10.4 mL formaldehyde (37% in H2O) and 4 mL 1 M sodium phosphate, pH 7.6, without removing the culture from the shaker. Crosslinking was continued for 5 min at 37 °C with continuous shaking at 140 rpm. Crosslinking was then quenched by addition of glycine to 500 mM and placement of flasks in an ice-water slurry for 30 min with occasional rotation by hand. Cells were recovered by centrifugation at 3,500 × g, 10 min, 4 °C and washed 3 times with 800 mL of phosphate-buffered saline (PBS; 137 mM NaCl, 2.7 mM KCl, 10 mM Na2HPO4, 1.8 mM KH2PO4). The resulting pellet was resuspended in 8 mL PBS and transferred in eight 1 mL aliquots to 1.5 mL microfuge tubes. Cells were collected by centrifugation at 3,500 × g, 10 min, 4 °C, washed once with 1 mL PBS, and then centrifuged once more at 3,500 × g, 10 min, 4 °C. The resulting cell pellets were frozen and stored at -80 °C. One cell pellet corresponding to 50 mL of original culture volume was used for each IP. Crosslinked cells from the same cultures were used for parallel IPs for each replicate (e.g., for H-NS, StpA, RNAP, RfaH, and σ70).

Immunoprecipitation
Crosslinked cell pellets from 50 mL of culture were thawed and resuspended in 500 µL IP buffer (100 mM Tris, pH 8.0, 300 mM NaCl, 1% TritonX-100, 1 mM Pefabloc SC [Millipore Sigma]) and sonicated using a Misonix Ultrasonic Liquid Processor (Model No. S-4000) for 16-60 min at 60% power with 10-sec on-10-sec off cycles, to generate DNA fragments of ~300-500 bp. RNase A was then added to 2 µg/mL and the sonicates were incubated for 1 h at 4 °C with rotation on a microfuge tube rotator at ~8 rotations per min. To generate IP input controls, 1/10th (~50 µL) of the resulting lysate was removed to a separate tube. The resulting input control sample was incubated overnight at 4 °C with rotation on a microfuge tube rotator at ~8 rotations per min with 30 µL of magnetic beads (NEB) coated with Protein A or Protein G (for comparison to monoclonal antibody or polyclonal antibody IPs, respectively) and 700 µL of IP buffer. The remaining lysate was pre-cleared for 3 h at 4 °C by incubation with 30 µL of the appropriate Protein A or G magnetic beads to remove protein-DNA complexes that bound beads non-specifically, with 250 µL of IP buffer added to keep the beads in solution (750 µL total volume). The beads were removed using a magnetic microfuge-tube stand (NEB) and the precleared lysate supernatant was then incubated after addition of antibody (2 µL monoclonal antibodies for σ70 and RNAP, 10 µL polyclonal antisera for RfaH, or 10 µL pre-adsorbed polyclonal antisera for H-NS and StpA) for 17-19 h at 4 °C at ~8 rotations per min on a microfuge tube rotator. Antibody-protein-DNA complexes were then recovered using the magnetic stand and washed once with 1 mL LiCl solution (250 mM LiCl, 100 mM Tris-HCl pH 8, 2% TritonX-100), twice with 1 mL 600 mM NaCl solution (100 mM Tris-HCl pH 8, 600 mM NaCl, 2% TritonX-100), twice with 1 mL 300 mM NaCl solution (100 mM Tris-HCl pH 8, 300 mM NaCl, 2% TritonX-100), and twice with 1 mL TE (10 mM Tris-HCl pH 8, 1 mM EDTA).
Protein-DNA complexes were then eluted from beads by addition of 100 µL ChIP elution buffer (50 mM Tris-HCl pH 8.0, 10 mM EDTA, 1% SDS) and incubation for 1 h at 65 °C. The beads were then removed using the magnetic stand and crosslink reversal was completed by incubation for 18 h at 65 °C. DNA was then recovered using Qiagen QIAquick PCR purification reagents and eluted in 58 µL of the Qiagen-supplied elution buffer. The DNA concentration was assessed using a QuBit dsDNA HS assay (ThermoFisher) on a QuBit 3.0 fluorimeter and then stored at -20 °C.
Input and no-antibody controls were generated similarly. Briefly, following overnight incubation with beads, ~750 µL of input supernatant was removed from no-antibody beads using a magnetic stand. Input supernatant was incubated at 65 °C overnight after addition of 100 µL ChIP elution buffer to reverse protein-DNA crosslinks. Input DNA was checked on a 1.5% agarose DNA gel to confirm proper sonication and shearing (typically 200-1500 bp with median 200-600 bp). The remaining beads ("no antibody") were processed and eluted as described for ChIP samples. No DNA was detected in the "no antibody" controls using the QuBit assay, so only the ChIP and input samples were used to prepare libraries for DNA sequencing.
IPs, inputs, and no-antibody controls were assessed for enrichment using ChIP-qPCR on a QuantStudio3 using PowerUp SYBR master mix (Thermo Fisher) before library preparation.
Paired-end libraries for DNA sequencing were prepared using the NEBNext Ultra II DNA library reagents with NEBNext sample purification beads according to the manufacturer's protocols without size selection. Quality and size ranges of libraries were determined using HS DNA Screentape (Agilent) on a TapeStation 4200 system (Agilent). Libraries were sequenced on Illumina MiSeq or NovaSeq 6000 sequencers.

ChIP-seq analysis
All ChIP-seq analyses were performed using an open-source-software-based ChIP-seq pipeline (https://github.com/cmhustmyer/2022_hustmyer). The pipeline manages computational jobs using Snakemake (5.24.2) (13) and maintains the relationship between samples and their metadata using peppy (14). Paired- and single-end reads were trimmed of adapter sequences using CutAdapt version 2.10 (15) with parameters '-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT' for paired-end samples and '-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA' for single-end samples. Quality trimming was performed using trimmomatic (version 0.39) (16) with parameters "LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15" with the respective PE or SE mode for paired-end and single-end samples. The reads were aligned to the respective genome (Refseq NC_004431.1 for CFT073 or NC_000913.3 for RL3000) using bowtie2 (version 2.4.2) (16) with parameters "--end-to-end --very-sensitive --phred33". For deletion strains, reads were first aligned to the WT genome to confirm lack of read coverage over the deleted gene, and then aligned with each deleted gene masked. Read coverage in 5-bp bins was scaled to the median coverage over all bins for each sample (17). Read coverage was calculated using deepTools (version 3.5.0) (18) with the parameters "--binSize 5 --samFlagInclude 66 --extendReads" for paired-end samples and "--binSize 5" for single-end samples. Normalized coverage based on the input (IP/input) or log2(IP/input) was calculated with custom python scripts. Read quality control was performed using fastqc version 0.11.8 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and multiqc version 1.11.0 (18).
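The median scaling and input normalization described above can be illustrated with a minimal sketch. This is not the pipeline's own code; the function names (`scale_to_median`, `log2_ratio`) are hypothetical, and returning `None` for bins with zero signal is an assumption about how empty bins might be handled.

```python
import math
from statistics import median


def scale_to_median(coverage):
    """Divide each 5-bp bin's coverage by the median coverage over all bins."""
    med = median(coverage)
    if med == 0:
        raise ValueError("median coverage is zero; cannot scale")
    return [c / med for c in coverage]


def log2_ratio(ip, inp):
    """Per-bin log2(IP/input).

    Assumption: bins where either signal is zero are reported as None
    instead of producing an infinite or undefined value.
    """
    out = []
    for a, b in zip(ip, inp):
        out.append(math.log2(a / b) if a > 0 and b > 0 else None)
    return out


# Toy example: three bins of raw coverage for an IP and its input.
ip_scaled = scale_to_median([20, 40, 60])
input_scaled = scale_to_median([10, 10, 10])
ratios = log2_ratio(ip_scaled, input_scaled)
```

Scaling each sample to its own median before taking the ratio keeps differences in sequencing depth between IP and input from shifting the ratio genome-wide.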
RfaH-bound regions (Dataset 2) were identified using the following steps. First, enriched RfaH regions were called in WT, ΔrfaH, and ΔhnsΔstpA using macs2 (version 2.2.7.1) (19) with the parameter "--broad" using the corresponding input as a control sample. Each region was then verified in WT or ΔhnsΔstpA, using two criteria: an ops site was present within the region and the region was less than 1.5-fold enriched over background in ΔrfaH (i.e., the signal significantly decreased or was not present in the RfaH ΔrfaH control ChIP).
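The two verification criteria can be sketched as a simple filter. This is an illustration only: the dictionary keys (`start`, `end`, `fold_in_delta_rfaH`) and the function name are hypothetical, and how fold enrichment over background is computed in the ΔrfaH control is left to the caller.

```python
def verify_rfah_regions(regions, ops_positions, max_fold_in_deletion=1.5):
    """Keep regions that contain an ops site and are < 1.5-fold enriched in the deletion control.

    regions: list of dicts with hypothetical keys 'start', 'end' (bp) and
             'fold_in_delta_rfaH' (enrichment over background in the control ChIP).
    ops_positions: genomic coordinates (bp) of predicted ops sites.
    """
    kept = []
    for region in regions:
        has_ops = any(region["start"] <= p <= region["end"] for p in ops_positions)
        low_in_control = region["fold_in_delta_rfaH"] < max_fold_in_deletion
        if has_ops and low_in_control:
            kept.append(region)
    return kept


candidate_regions = [
    {"start": 100, "end": 500, "fold_in_delta_rfaH": 1.1},    # ops present, signal lost in control
    {"start": 1000, "end": 1500, "fold_in_delta_rfaH": 3.0},  # ops present, signal persists in control
    {"start": 2000, "end": 2500, "fold_in_delta_rfaH": 1.0},  # no ops site
]
verified = verify_rfah_regions(candidate_regions, ops_positions=[300, 1200])
```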
ChIP signal averages per gene, window, and TU were calculated using custom python scripts. Biological replicates with a Pearson correlation coefficient of at least ~0.94 for IP/input signal per gene were used to calculate average ChIP signals after normalization. Gene annotations were parsed from GenBank NC_004431.1 and NC_000913.3 using custom python scripts. Genes for which reads were absent in input or IP for more than 20% of their length were excluded from analyses.
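A replicate-agreement check of this kind might look like the following sketch. The helper names are hypothetical, and treating the ~0.94 threshold as inclusive is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def replicates_agree(rep_a, rep_b, min_r=0.94):
    """Compare per-gene IP/input signals (dicts keyed by gene) between two replicates."""
    genes = sorted(set(rep_a) & set(rep_b))
    r = pearson([rep_a[g] for g in genes], [rep_b[g] for g in genes])
    return r >= min_r
```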
To check for any undesired genetic changes that might have arisen during cell growth, input sample sequences were compared to the NCBI reference NC_004431.1 sequence file for CFT073 (20) using breseq (21). We discovered many (>350) sequence differences that were common to all our CFT073 strains relative to the NC_004431.1 sequence (Dataset S1G). We suspect most if not all represent errors in the NCBI record. We found a small number (12) of sequence variations that were present in only one or a few of our CFT073 lineages or datasets (Dataset S1H). None of these 12 sequence differences affected genes encoding RNAP subunits, H-NS, StpA, RfaH, NusG, or σ70.
Raw and processed reads from this work are deposited at GEO with accession number GSE212064. All input files, custom scripts, and pipeline definition code required to re-create the analysis described here are available at: https://github.com/cmhustmyer/2022_hustmyer. H-NS and StpA Myc-tagged ChIP-exo data (22) (GEO GSE181767) were also analyzed using this pipeline.

RNA-seq analysis to define transcription units
To define transcription units (TUs) in WT CFT073, we analyzed CFT073 RNA-seq data collected from cells grown in M9 minimal media supplemented with 0.2% glucose at mid-exponential phase (23) (GEO GSE122296, samples GSM3463591 and GSM3463592). TUs were inferred from RNA-seq data using Rockhopper (24). Transcript isoforms that were within a Hausdorff distance of 100 bp were merged into a single transcript with boundaries set to the maximum 5′ and 3′ ends of the set of similar transcripts. Because only WT RNA-seq data were available, TUs that we hypothesized are not transcribed in WT (e.g., wza) could not be defined using Rockhopper. Relevant TU boundaries and approximate TSSs from apparent TU boundaries and σ70 peaks are noted in figure legends and in Dataset S2.
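As an illustration of the isoform-merging step, the sketch below uses the fact that the Hausdorff distance between two closed intervals on a line is the larger of the two endpoint offsets. Several choices here are assumptions not stated in the text: the threshold is inclusive, isoforms are merged greedily in sorted order, "maximum 5′ and 3′ ends" is read as the outermost boundaries, and strand is ignored (each strand would be processed separately).

```python
def hausdorff(a, b):
    """Hausdorff distance between two closed intervals (start, end) in bp."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def merge_isoforms(transcripts, max_dist=100):
    """Greedily merge transcripts whose Hausdorff distance is <= max_dist.

    Merged boundaries are the outermost start and end of the merged set
    (an interpretation of 'maximum 5' and 3' ends').
    """
    merged = []
    for t in sorted(transcripts):
        for i, m in enumerate(merged):
            if hausdorff(t, m) <= max_dist:
                merged[i] = (min(m[0], t[0]), max(m[1], t[1]))
                break
        else:
            # No existing transcript is close enough; keep this one separate.
            merged.append(t)
    return merged


isoforms = [(100, 1000), (150, 1050), (100, 3000)]
tus = merge_isoforms(isoforms)
```

In this toy input the first two isoforms differ by 50 bp at each end and collapse into one transcript, while the third shares a start but ends 2 kb further downstream and stays separate.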

RNAP occupancy normalization
To compare RNAP occupancies across strains for which major genome-scale differences in transcription patterns are not expected (e.g., to compare WT and ΔrfaH, where RfaH targets only 4 sites in the entire WT CFT073 genome), we scaled RNAP ChIP signals linearly between the maximal signal for bound RNAP (assigned as 1) and the background signal (assigned as 0) for genes at which RNAP was bound only non-specifically, as described previously for RNAP ChIP-chip data (25) with minor modification. RNAP binds non-specifically to non-transcribed DNA (26), resulting in background ChIP signal (Fig. 1E). To define the background RNAP ChIP signal, we averaged ChIP signals for the 20 genes with the lowest average RNAP ChIP signal in each replicate and assigned this average as 0 RNAP occupancy for that replicate. In our analyses of CFT073 RNAP ChIP-seq (Dataset S1), ~9% of coding genes were indistinguishable from this background signal. This untranscribed percentage of the CFT073 genome is similar to the 7% of genes found as RNAP background for MG1655 (25). Maximal RNAP occupancy occurs on rRNA and tRNA genes (27). Therefore, to assign the RNAP occupancy value of 1, we averaged the RNAP ChIP signals for 20 pre-determined tRNA genes, which had among the highest (top 50) average RNAP ChIP signals in each replicate and each strain (see https://github.com/cmhustmyer/2022_hustmyer). We excluded rRNA genes from this analysis due to complications mapping reads to the 7 homologous rRNA operons. This method of RNAP occupancy scaling allows us to compare RNAP occupancy across strains (e.g., WT vs. ΔrfaH, ΔstpA vs. ΔstpAΔrfaH, and ΔhnsΔstpA vs. ΔhnsΔstpAΔrfaH).
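The linear scaling described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: `occupancy_scale` and its arguments are hypothetical names, and the reference gene list stands in for the 20 pre-determined tRNA genes.

```python
def occupancy_scale(signal_by_gene, reference_genes, n_low=20):
    """Linearly rescale per-gene ChIP signal so background -> 0 and reference -> 1.

    signal_by_gene: dict of gene -> average ChIP signal for one replicate.
    reference_genes: genes defining occupancy 1 (e.g., pre-determined tRNA genes).
    n_low: number of lowest-signal genes averaged to define the background.
    """
    if n_low > len(signal_by_gene):
        raise ValueError("n_low exceeds the number of genes")
    lowest = sorted(signal_by_gene.values())[:n_low]
    background = sum(lowest) / n_low
    maximum = sum(signal_by_gene[g] for g in reference_genes) / len(reference_genes)
    span = maximum - background
    if span <= 0:
        raise ValueError("reference signal must exceed background")
    return {g: (s - background) / span for g, s in signal_by_gene.items()}


# Toy replicate with two background genes and two reference genes.
toy_signal = {"g1": 1.0, "g2": 1.0, "g3": 3.0, "t1": 5.0, "t2": 5.0}
toy_occupancy = occupancy_scale(toy_signal, ["t1", "t2"], n_low=2)
```

Because both anchors are recomputed per replicate and per strain, a gene's scaled occupancy is expressed relative to that sample's own background and maximum, which is what makes cross-strain comparison meaningful.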

Traveling ratios
Traveling ratio (TR) calculations were performed as described previously (25,28), with minor modifications (Fig. 3F and Dataset S3). The TR for RfaH-regulated genes was defined as the ratio of average RNAP occupancy in a 300 bp window 5.4-5.8 kb downstream from the ops to the average RNAP occupancy 400-700 bp downstream from the ops. These windows were chosen to capture the RfaH-based effects on elongation. A window too near the TSS included signal from promoters that was convoluted with signal from elongating RNAP; 0.4 kb downstream of ops avoided signals from promoters so a meaningful elongation traveling ratio could be calculated. Four non-RfaH or H-NS regulated TUs were also analyzed as controls (Dataset S3, Fig. S4A-B). For the controls, the upstream 300-bp window was centered 0.3 kb from the TSS for each TU and the downstream window was either centered 5 kb from the 5′ 0.3 kb window or, if the TU was not long enough, the downstream-most 0.3 kb of the TU. RfaH-dependent TRs (RTRs) were calculated by dividing the TR for a ∆rfaH strain by the TR for the corresponding rfaH+ strain. All TR data are included in extended dataset S3.
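The TR and RTR definitions can be written out as a short sketch. It assumes RNAP occupancy is stored per 5-bp bin in transcription order, that coordinates are given relative to the start of the array, and that the windows use the coordinates as stated (400-700 bp and 5.4-5.8 kb downstream of ops); the function names are hypothetical.

```python
def mean_in_window(occupancy, start_bp, end_bp, bin_size=5):
    """Average occupancy over bins covering [start_bp, end_bp)."""
    bins = occupancy[start_bp // bin_size : end_bp // bin_size]
    if not bins:
        raise ValueError("window lies outside the occupancy array")
    return sum(bins) / len(bins)


def traveling_ratio(occupancy, ops_bp, bin_size=5):
    """Downstream-window occupancy divided by ops-proximal-window occupancy."""
    proximal = mean_in_window(occupancy, ops_bp + 400, ops_bp + 700, bin_size)
    distal = mean_in_window(occupancy, ops_bp + 5400, ops_bp + 5800, bin_size)
    return distal / proximal


def rfah_dependent_tr(tr_delta_rfah, tr_rfah_plus):
    """RTR: TR in the rfaH deletion strain divided by TR in the matching rfaH+ strain."""
    return tr_delta_rfah / tr_rfah_plus


# Toy 10-kb profile (2,000 bins) where occupancy halves in the distal window.
toy_profile = [1.0] * 1080 + [0.5] * 80 + [1.0] * 840
toy_tr = traveling_ratio(toy_profile, ops_bp=0)
```

An RTR below 1 then indicates that fewer polymerases reach the distal window without RfaH than with it.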

H-NS and StpA occupancy normalization
To compare H-NS and StpA distributions in various strains, we used an occupancy normalization similar to the RNAP occupancy normalization. Each ChIP dataset was normalized to an appropriate input dataset (e.g., WT H-NS IP was normalized to WT input, H-NS IP in ΔstpA was normalized to the ΔstpA input). Average signals per gene or window were calculated using two or three biological replicates. These average signals were then linearly scaled between 0 and 1, where 0 was the average of signals for the 20 genes with lowest signal in the H-NS IP and 1 was the average of the 20 genes with highest signal for each IP. The average occupancy-normalized H-NS and StpA signals were then calculated per gene (Fig. 6) or per 1.4 kb window (Fig. S6).
The cut-off for H-NS bound genes in WT CFT073 was determined by an apparent inflection point in a cumulative plot of average H-NS ChIP-signal per gene (Fig. S1A). This yielded a cutoff of 0.07 times the maximal average H-NS ChIP signal per gene and the set of genes shown in Fig. 6C. To determine the subset of these genes scored as still bound in ΔrfaH, ΔstpA, Δhns, ΔhnsΔstpA strains (Dataset 1, Fig. 6C), we used a cut-off that was 0.07 times the maximal average ChIP signal per gene for that strain. For both WT and deletion strain determinations, the maximal average ChIP signal per gene was set as the average for the top 20 genes.
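The cutoff rule can be illustrated with a minimal sketch. The function name is hypothetical, and treating a gene whose signal equals the cutoff as bound is an assumption.

```python
def bound_genes(signal_by_gene, fraction=0.07, n_top=20):
    """Return genes whose average ChIP signal is at least fraction x the top-n average.

    The 'maximal average ChIP signal per gene' is taken as the mean of the
    n_top highest per-gene signals in the same strain.
    """
    top = sorted(signal_by_gene.values(), reverse=True)[:n_top]
    max_avg = sum(top) / len(top)
    cutoff = fraction * max_avg
    return {g for g, s in signal_by_gene.items() if s >= cutoff}


# Toy strain: the top-2 average is 10, so the cutoff is about 0.7.
toy_hns = {"a": 10.0, "b": 10.0, "c": 1.0, "d": 0.5}
toy_bound = bound_genes(toy_hns, n_top=2)
```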

Relative ChIP occupancy for local transcription units
To compare ChIP signals for different targets within the same strain (e.g., Fig. 2, Fig. S3, Fig. S5G-J), ChIP data were normalized to local maximal and minimal signals. For Fig. 2, raw ChIP signals (median-read normalized) from 1800 nt upstream (US) and downstream (DS) of the 5′ and 3′ ends of predicted TSS were averaged across replicates. Each IP was then individually normalized within the TU window, using the average of the five lowest 5-bp window signals as background (set to 0) and the average of the five highest 5-bp window signals as the maximum (set to 1). Sequential rolling averages over 25 bp windows (at least 5 times) were then used to smooth signals, which also had the consequence of causing the highest peaks to be somewhat below 1. For RfaH/RNAP ratios in Figs. S5C-E and S5G-J, signal was log-scaled.
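The local normalization and repeated smoothing can be sketched as below. This is an illustration with assumptions: a 25-bp rolling window is taken to span five 5-bp bins, the window is centered and shrinks at the array edges, and the function names are hypothetical.

```python
def normalize_local(signal, n_extreme=5):
    """Rescale so the mean of the n lowest bins is 0 and the mean of the n highest is 1."""
    ordered = sorted(signal)
    low = sum(ordered[:n_extreme]) / n_extreme
    high = sum(ordered[-n_extreme:]) / n_extreme
    if high == low:
        raise ValueError("signal is flat; cannot normalize")
    return [(v - low) / (high - low) for v in signal]


def rolling_mean(signal, window_bins=5):
    """Centered rolling average; the window shrinks at the array edges (assumption)."""
    half = window_bins // 2
    out = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - half) : i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def smooth(signal, passes=5, window_bins=5):
    """Apply the rolling average sequentially, as in repeated 25-bp smoothing."""
    for _ in range(passes):
        signal = rolling_mean(signal, window_bins)
    return signal


# A narrow plateau of 1.0 ends up below 1 after repeated smoothing.
toy_track = [0.0] * 10 + [1.0] * 5 + [0.0] * 10
toy_smoothed = smooth(normalize_local(toy_track))
```

Each pass spreads a peak into its neighbors, which is why the highest peaks in the smoothed tracks sit somewhat below 1.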
The ops sequence logo was generated using weblogo (https://weblogo.berkeley.edu). A search for ops motifs genome-wide in CFT073 was performed using FIMO motif scanning (https://meme-suite.org/meme/tools/fimo) using the ops sequence logo and a fasta sequence file generated from GenBank NC_004431.1. Both strands were scanned using a match p-value of < 1E-6. All ops sites called are listed in Dataset 2 and cataloged by their strand orientation, genomic location, and whether RfaH signal was enriched over input using macs2.
Heat maps were generated using the Interactive CHM builder (MD Anderson Cancer Center, University of Texas) (35).

Quantitative western blots
Quantitative western blots were performed as previously described (26) with minor modifications using in vitro synthesized H-NS, StpA, and Hfp as standards.

Plasmid preparation for in vitro synthesis of H-NS, StpA, and Hfp
Expression plasmids for in vitro protein synthesis were constructed from synthetic DNA fragments (IDT) encoding a T7 RNAP promoter, CFT073 hns, stpA, or hfp, a C-terminal 3X FLAG epitope tag, and a 3′ stem-loop RNA structure (Tables S1, S2). DNAs were 5′-phosphorylated using T4 polynucleotide kinase (NEB) and then blunt-end ligated into pSMART HCKan (Lucigen) to yield pCMH01-03 (Table S2). Several sequence modifications were made to aid protein expression. A 6X-His tag was added after the 3X FLAG tag using primers #14968 and #14969 by Q5 QuickChange (NEB) to generate pCMH04-06 (Table S2). Each coding sequence flanked by C-terminal 3X-FLAG and 6X His tags was PCR amplified using primers #15085 (H-NS), #15087 (StpA), and #15089 (Hfp) with #15086 to amplify the C-terminal His region (Table S1). A backbone fragment that contained a full T7 terminator was amplified from pT7_terminator (NEB, Table S2) using primers #15083 and #15084. The purified PCR products were Gibson assembled to generate pCMH008-010 (Table S2) that were used for in vitro protein synthesis.

In vitro protein synthesis for western blot analysis
C-terminal 3XFLAG and 6XHis-tagged H-NS, StpA, and Hfp were synthesized for western blot analysis using the PUREfrex 2.0 reconstituted cell-free protein synthesis reagents (Cosmo Bio USA) (36). pCMH08-10 (~30 ng) were used as templates. Proteins were synthesized for 4 hours at 37 °C following the manufacturer's protocol. Protein synthesis was confirmed using SDS-PAGE. Protein concentrations of in vitro synthesized proteins were estimated by Coomassie staining after SDS-PAGE compared to a purified H-NS standard of known concentration quantified using the QuBit protein assay (ThermoFisher) on a QuBit 3.0 fluorimeter (5).

Measurement of number of cells and total cellular protein
CFT073 or RL3000 strains were grown in MOPS rich-defined media. Cell pellets from 1 mL of culture were isolated by centrifugation at apparent OD600 ~0.4 and stored at -80 °C. Cell numbers were estimated from colony-forming units on LB plates incubated at 37 °C. To generate cell lysates, pellets were resuspended in 1 mL of 1X PBS, combined with 0.1 mL of 0.15% deoxycholic acid, and incubated at room temperature for 10 min. Trichloroacetic acid (TCA) (111 µL 50% w/v) was then added and the mixture was incubated on ice for 30 min. The precipitate was collected by centrifugation at 22,000 × g, 30 min, 4 °C. The precipitate was resuspended in 60 µL of a mixture of 0.1 M Tris-HCl (pH 6.8) and 4% SDS and held for 10 min at 100 °C. The Pierce BCA protein assay (Thermo Scientific) was used to generate a standard curve to estimate total cellular protein concentration according to the manufacturer's protocol. After combining with assay reagents, standards and unknown proteins were measured at OD562 on a Tecan M1000 Infinite plate reader.

Preparation of whole-cell extracts and western blots
Pellets were lysed and TCA precipitated as described above, centrifuged, then resuspended in 60 µL 2X SDS sample buffer (0.125 M Tris-HCl, pH 6.8, 4% SDS, 20% glycerol, 0.02% bromophenol blue, 1.43 M β-mercaptoethanol). Excess TCA was neutralized using puffs of NH3 vapor from a Pasteur pipette, after which proteins were denatured by incubation for 10 min at 100 °C. To